TALP system for the English lexical sample task

Authors

  • Gerard Escudero
  • Lluís Màrquez i Villodre
  • German Rigau

Abstract

This paper describes the TALP system for the English Lexical Sample task of the Senseval-3 event. The system is fully supervised and relies on a single Machine Learning algorithm, Support Vector Machines (SVMs). It uses no training examples beyond those provided by the Senseval-3 organisers, although it does use external tools and ontologies to extract some of the representation features.

Three main characteristics of the system architecture are worth highlighting. The first is the way in which the multiclass classification problem posed by Word Sense Disambiguation (WSD) is addressed with binary SVM classifiers. Two approaches to binarizing multiclass problems have been tested, one-vs-all and constraint classification, and the better strategy has been selected for each word in a cross-validation setting. Section 2 explains this issue in detail. The second characteristic is the rich set of features used to represent training and test examples. Topical and local context features are used as usual, but syntactic relations and semantic features indicating the predominant semantic classes in the example context are also taken into account. The features are described in detail in section 3. Finally, since each word represents a learning problem with different characteristics, per-word feature selection has been applied. This tuning process is explained in section 4. The last two sections discuss the experimental results (section 5) and present the main conclusions of the work (section 6).
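
The two binarization strategies can be contrasted in a short sketch. To stay dependency-free it substitutes a plain perceptron for the SVM, so the learner, the function names and the toy data are assumptions for illustration, not the TALP implementation. One-vs-all trains one independent binary classifier per sense. Constraint classification instead turns each example (x, y) into the pairwise constraints w_y · x > w_j · x for every other sense j and learns all sense weight vectors jointly. Both assign the highest-scoring sense.

```python
from typing import List, Sequence

# Assumption: a perceptron stands in for the SVM so the sketch has no
# dependencies. Each example is a feature vector x (ending in a constant
# bias feature) with an integer sense label y.


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_one_vs_all(X: List[List[float]], y: List[int], n_classes: int,
                     max_epochs: int = 100) -> List[List[float]]:
    """One binary classifier per sense c: examples of c are +1, all others -1."""
    weights = []
    for c in range(n_classes):
        w = [0.0] * len(X[0])
        for _ in range(max_epochs):
            mistakes = 0
            for x, label in zip(X, y):
                target = 1.0 if label == c else -1.0
                if target * dot(w, x) <= 0:  # wrong side of (or on) the boundary
                    w = [wi + target * xi for wi, xi in zip(w, x)]
                    mistakes += 1
            if mistakes == 0:  # this binary problem is now separated
                break
        weights.append(w)
    return weights


def train_constraint(X: List[List[float]], y: List[int], n_classes: int,
                     max_epochs: int = 100) -> List[List[float]]:
    """Constraint classification: every example (x, y) contributes the
    constraints w_y . x > w_j . x for all j != y, and all sense weight
    vectors are learned jointly as one linear separator."""
    W = [[0.0] * len(X[0]) for _ in range(n_classes)]
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in zip(X, y):
            for j in range(n_classes):
                if j != label and dot(W[label], x) <= dot(W[j], x):
                    # Violated constraint: move w_y towards x and w_j away from it.
                    W[label] = [a + b for a, b in zip(W[label], x)]
                    W[j] = [a - b for a, b in zip(W[j], x)]
                    mistakes += 1
        if mistakes == 0:  # all constraints satisfied
            break
    return W


def predict(W: List[List[float]], x: Sequence[float]) -> int:
    """Both strategies assign the sense whose weight vector scores highest."""
    scores = [dot(w, x) for w in W]
    return max(range(len(W)), key=lambda c: scores[c])


# Toy "word" with three senses; the last feature is a constant bias.
X = [[2.0, 0.0, 1.0], [0.0, 2.0, 1.0], [-2.0, -2.0, 1.0]]
y = [0, 1, 2]
for train in (train_one_vs_all, train_constraint):
    W = train(X, y, n_classes=3)
    print(train.__name__, [predict(W, x) for x in X])
```

On this linearly separable toy data both strategies recover the three training senses. They differ in that one-vs-all solves k unrelated binary problems, whereas constraint classification couples the sense vectors through pairwise constraints, which is one reason the better choice can vary from word to word.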
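
The topical and local context features mentioned in the abstract can be illustrated with a small extractor. The window size, the feature-name scheme (for example `w-1=the` or `bow=cash`) and the function name are hypothetical choices, not the paper's exact feature set; the syntactic and semantic features require a parser and lexical resources and are left out of this sketch.

```python
from typing import List, Set


def extract_features(tokens: List[str], target: int, window: int = 2) -> Set[str]:
    """Hypothetical local and topical features for the word at tokens[target]."""
    words = [t.lower() for t in tokens]
    feats: Set[str] = set()
    # Local context: word forms at each relative position inside the window.
    for offset in range(-window, window + 1):
        i = target + offset
        if offset != 0 and 0 <= i < len(words):
            feats.add(f"w{offset:+d}={words[i]}")
    # Local collocations: the word bigrams immediately left and right of the target.
    if target >= 2:
        feats.add(f"left_bigram={words[target - 2]}_{words[target - 1]}")
    if target + 2 < len(words):
        feats.add(f"right_bigram={words[target + 1]}_{words[target + 2]}")
    # Topical context: an unordered bag of the other words in the example.
    feats.update(f"bow={w}" for i, w in enumerate(words) if i != target)
    return feats


tokens = "He deposited the cash in the bank yesterday".split()
feats = extract_features(tokens, target=tokens.index("bank"))
print(sorted(feats))
```

Because the target is the second-to-last token, no right bigram is produced, while the bag-of-words features still capture "cash" and "deposited" far outside the local window.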
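
Both the per-word choice of binarization strategy and the per-word feature selection amount to picking, for each word independently, the configuration with the best cross-validated accuracy. The sketch below shows that idea with a generic k-fold selector choosing between two feature groups. The overlap classifier standing in for the SVM, the two feature groups, the function names and the toy words and senses are all invented for illustration; the actual tuning procedure described in section 4 may search a larger space.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Example = Tuple[FrozenSet[str], str]  # (feature set, sense label)
Classifier = Callable[[FrozenSet[str]], str]
Trainer = Callable[[List[Example]], Classifier]


def overlap_trainer(prefix: str) -> Trainer:
    """Hypothetical stand-in for an SVM restricted to one feature group:
    predicts the sense of the training example sharing the most features
    that start with `prefix` (the earliest such example wins ties)."""
    def train(training: List[Example]) -> Classifier:
        kept = [(frozenset(f for f in feats if f.startswith(prefix)), sense)
                for feats, sense in training]

        def classify(feats: FrozenSet[str]) -> str:
            query = {f for f in feats if f.startswith(prefix)}
            return max(kept, key=lambda item: len(item[0] & query))[1]
        return classify
    return train


def kfold_accuracy(train: Trainer, examples: List[Example], k: int) -> float:
    """Accuracy when each example is held out exactly once across k folds."""
    correct = 0
    for fold in range(k):
        held_out = examples[fold::k]
        training = [e for i, e in enumerate(examples) if i % k != fold]
        classify = train(training)
        correct += sum(classify(feats) == sense for feats, sense in held_out)
    return correct / len(examples)


def select_per_word(candidates: Dict[str, Trainer],
                    data: Dict[str, List[Example]], k: int = 3) -> Dict[str, str]:
    """Pick, separately for every word, the candidate with the best CV accuracy."""
    choice = {}
    for word, examples in data.items():
        scores = {name: kfold_accuracy(t, examples, k) for name, t in candidates.items()}
        choice[word] = max(scores, key=scores.get)
    return choice


def ex(prev: str, topic: str, sense: str) -> Example:
    return frozenset({f"w-1={prev}", f"bow={topic}"}), sense


# Invented toy data: for "bank" the previous word is informative,
# for "interest" the topical word is.
data = {
    "bank": [ex("savings", "water", "finance"), ex("river", "money", "river"),
             ex("savings", "money", "finance"), ex("river", "water", "river"),
             ex("savings", "water", "finance"), ex("river", "money", "river")],
    "interest": [ex("of", "rate", "money"), ex("in", "hobby", "attention"),
                 ex("in", "rate", "money"), ex("of", "hobby", "attention"),
                 ex("of", "rate", "money"), ex("in", "hobby", "attention")],
}
candidates = {"local": overlap_trainer("w-1="), "topical": overlap_trainer("bow=")}
print(select_per_word(candidates, data))
```

With this toy data, "bank" ends up with the local feature group and "interest" with the topical one, mirroring the idea that each word is treated as a separate learning problem with its own best configuration.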

Publication date: 2004